Method for discovering a biomarker

ABSTRACT

The invention relates to a method for discovering biomarkers, comprising: matching the expression levels of genetic factors in persons, including a plurality of patients having a specific disease, for each of the persons; and comparing the expression levels of the genetic factors and genes corresponding thereto by any one or more of cluster analysis and correlation analysis to select some of the genetic factors. According to the invention, highly accurate biomarkers for a specific disease can be discovered in a simple and easy manner.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method for discovering biomarkers, and more particularly, to a method of simply and easily discovering highly accurate biomarkers for a specific disease by comparing the expression levels of genetic factors and genes corresponding thereto by any one or more of cluster analysis and correlation analysis.

2. Description of the Prior Art

Breast cancer is a heterogeneous disease with respect to clinical behavior and response to therapy. This variability is a result of the differing molecular make-up of cancer cells within each subtype of breast cancer. However, only two molecular characteristics are currently being exploited as therapeutic targets. These are estrogen receptor (ER) and HER2, which are targets of antiestrogens (tamoxifen and aromatase inhibitors) and HERCEPTIN®, respectively. Efforts to target these two molecules have proven to be extremely productive. Nevertheless, those tumors that do not have these two targets are often treated with chemotherapy, which generally targets proliferating cells.

Since some important normal cells are also proliferating, they are damaged by chemotherapy at the same time. Therefore, chemotherapy is associated with severe toxicity. Identification of molecular targets in tumors in addition to ER or HER2 is critical in the development of new anticancer therapy.

In addition, the development and progression of cancer is not caused by a few specific genes but results from the complex interaction of many genes involved in the various signaling and regulatory mechanisms that operate during the progression of cancer. Accordingly, studies on the mechanisms of cancer formation that focus on a few specific genes are of limited scope. Thus, new cancer-related genes need to be identified by comparatively analyzing the expression levels of a large number of genes between normal cells and cancer cells.

SUMMARY OF THE INVENTION

Accordingly, the present invention has been made in view of the problems occurring in the prior art, and it is an object of the present invention to discover a highly accurate biomarker for a specific disease in a simple and easy manner.

To achieve the above object, the present invention provides a method for discovering biomarkers, comprising the steps of: matching the expression levels of genetic factors in persons, including a plurality of patients having a specific disease, for each of the persons; and comparing the expression levels of the genetic factors and genes corresponding thereto by any one or more of cluster analysis and correlation analysis to select some of the genetic factors.

Herein, the genetic factor is preferably one or more selected from the group consisting of chromosomal genes, single nucleotide polymorphisms (SNPs), copy-number variations (CNVs) and micro-RNAs (miRNAs).

In one embodiment of the present invention, matching the expression levels of the genetic factors for each of the persons may be performed by matching the expression levels of genes on the chromosomes of the plurality of patients having the specific disease for each of the patients, and the analysis step may comprise the steps of: selecting information about genes related to the specific disease from among the genes; analyzing the expression patterns of the selected genes in the patients according to the type of the disease; and clustering the genes according to the expression patterns.

Herein, selecting only the information about genes related to the specific disease from among the genes may be performed by selecting only information about genes known to be related to the specific disease.

Also, analyzing the expression patterns of the selected genes in the patients according to the type of the disease may be performed by dividing the expression patterns of the genes in the patients according to the disease type into two or more levels.

Moreover, the step of clustering the genes according to the expression patterns preferably comprises selecting only genes that can be clustered according to the expression patterns, and designating the selected genes as markers related to subtyping of the specific disease.

In another embodiment of the present invention, matching the expression levels of the genetic factors for each of the persons may be performed by matching the expression levels of single nucleotide polymorphisms (SNPs) and genes on the chromosomes of the plurality of patients having the specific disease for each of the patients, and the analysis step may comprise the steps of: selecting a copy-number variation (CNV) region in which the expression levels of the SNPs are higher or lower than a specific reference value, and selecting CNVs present on effective genes at the location on the chromosome of the CNV region; and performing correlation analysis of the expression levels of the selected CNVs and genes corresponding thereto on the chromosomes of the patients to select genes showing positive (+) correlation.

Herein, the effective genes are preferably sequences containing genetic information.

Also, selecting the CNVs may be performed by selecting a CNV region in which the expression levels of the SNPs are higher than a first reference value or lower than a second reference value, and selecting CNVs present on sequences containing genetic information at the location on the chromosome of the CNV region.

In still another embodiment, matching the expression levels of the genetic factors for each of the persons may be performed by matching the expression levels of micro-RNAs (miRNAs) and genes in the persons, including the plurality of patients having the specific disease, for each of the persons, and the analysis step may comprise performing correlation analysis of the miRNAs and genes corresponding thereto to select genes showing negative (−) or positive (+) correlation, and selecting genes corresponding to miRNAs related to the specific disease from among the selected genes showing negative (−) or positive (+) correlation.

Herein, the miRNAs related to the specific disease are preferably miRNAs known to be related to the specific disease.

Still another embodiment of the present invention is directed to a method for discovering biomarkers by mechanism analysis, the method comprising the steps of:

classifying genes, belonging to a candidate gene group suitable for use as biomarkers of disease, as a group related to the mechanism of action of a specific disease; and

comparing the expression levels of genes of the classified group in a plurality of patient groups having the specific disease and a normal person group to select genes which are expressed more highly in the patient groups.

Herein, the candidate gene group preferably includes genes obtained by the above biomarker discovery method.

Also, the candidate gene group includes genes obtained by the method for discovering biomarkers for subtyping, genes obtained by the method of discovering biomarkers by copy-number variations (CNVs), and genes obtained by the method of discovering biomarkers by micro-RNAs (miRNAs).

Further, classifying the genes belonging to the candidate gene group as the group related to the mechanism of action of the specific disease may be performed by comparing the expression levels of genes between the plurality of patient groups having the specific disease and the normal person group, and selecting a mechanism of action of a disease that includes genes expressed more highly in the patient groups as a group related to the mechanism of action of the specific disease.

In addition, selecting the genes which are expressed more highly in the patient groups having the specific disease may be performed by carrying out a T-test between the patient groups having the specific disease and the normal person group.

Moreover, comparing the expression levels of genes of the classified group to select genes which are expressed more highly in the patient groups is preferably performed by first performing a T-test for the genes of the classified group that have high expression levels, to select genes which are more highly expressed in the patient groups.

Still another embodiment of the present invention is directed to breast cancer-related biomarkers including the genes shown in Table 1.

Also, the present invention is directed to biomarkers allowing the identification of subtypes of breast cancer.

In addition, the present invention is directed to a breast cancer test kit comprising: a microarray including probes corresponding to the biomarkers; and an optical measurement device for measuring changes in the expression of the genes.

Details of other embodiments are included in the detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example of a matching table showing the expression levels of genes in each patient, which is used in a method for discovering biomarkers for subtyping according to a preferred embodiment of the present invention.

FIG. 2 is an example of the expression pattern of each gene in patients according to each disease type.

FIG. 3 is a table showing an example of genes clustered according to the expression pattern of FIG. 2.

FIG. 4 is an example of a matching table showing the expression levels of single nucleotide polymorphisms (SNPs) in each patient, which is used in a method of discovering biomarkers by copy-number variations (CNVs) according to a preferred embodiment of the present invention.

FIG. 5 is an example of a chromosome on which a CNV region selected from the expression levels of the SNPs of FIG. 4 and a CNV region including effective genes are shown.

FIG. 6 is a graph showing an example of correlation analysis of the expression levels of a CNV of FIG. 4 and a gene corresponding thereto.

FIG. 7 is an example of a matching table showing the expression levels of micro-RNAs (miRNAs) in each patient, which is used in a method of discovering biomarkers by miRNAs according to a preferred embodiment of the present invention.

FIG. 8 is a graph showing an example of correlation analysis of the expression levels of the miRNA of FIG. 7 and a gene corresponding thereto.

FIG. 9 is an example of genes for each mechanism, illustrating the mechanism analysis used in a method of discovering biomarkers by mechanism analysis according to a preferred embodiment of the present invention.

FIG. 10 is a table showing an example of the expression levels of genes belonging to mechanism I of FIG. 9.

FIG. 11 is a table showing an example of the expression levels of genes belonging to mechanism II of FIG. 9.

FIG. 12 is a table showing an example of the expression levels of genes belonging to mechanism III of FIG. 9.

FIG. 13 is a graph showing an example of the accuracy at each significance level for biomarkers discovered by a biomarker identification method according to a preferred embodiment of the present invention.

FIG. 14 is an optical photograph showing the results of identifying the subtypes of breast cancer using biomarkers identified by a biomarker identification method according to a preferred embodiment of the present invention.

FIG. 15 is a diagram showing a comparison between biomarkers according to a preferred embodiment of the present invention and biomarkers of other companies.

DETAILED DESCRIPTION OF THE INVENTION

The present invention may be modified in various ways and may have various embodiments, particular examples of which will be illustrated in the drawings and described in detail. However, it should be understood that the following exemplary description is not intended to restrict the present invention to specific embodiments, and the present invention is meant to cover all modifications, equivalents and alternatives included in the spirit and scope of the present invention. In the following description, a detailed description of related known technology will be omitted when it may obscure the subject matter of the present invention.

The terms used in the present specification are used only to describe specific embodiments, and are not intended to limit the present invention. Singular expressions may include the meaning of plural expressions as long as there is no definite difference therebetween in the context. In the present application, it should be understood that terms such as “include” or “have” are intended to indicate that proposed features, numbers, steps, operations, components, parts, or combinations thereof exist, and the possibility of the existence or addition of one or more other features, steps, operations, components, parts or combinations thereof is not excluded thereby.

Terms such as “first” and “second” can be used to describe various components, but the components are not limited by the terms. The terms are merely used to distinguish one component from another component.

A method for discovering biomarkers according to the present invention comprises the steps of: matching the expression levels of genetic factors in persons, including a plurality of patients having a specific disease, for each of the persons; and comparing the expression levels of the genetic factors and genes corresponding thereto by any one or more of cluster analysis and correlation analysis, thereby selecting some of the genetic factors.

The present invention is directed to a method for discovering biomarkers which are suitable for examining a specific disease on the basis of the expression levels of genetic factors in patients or persons including the patients. The genetic factor may be one or more selected from the group consisting of chromosomal genes, single nucleotide polymorphisms (SNPs), copy-number variations (CNVs) and micro-RNAs (miRNAs). In other words, the present invention is directed to a method for discovering highly accurate biomarkers by the use of genes of patients or persons, CNVs, miRNAs related to a specific disease, or a combination of two or more thereof.

Specifically, in the method for identifying biomarkers according to the present invention, a step of matching the expression levels of genetic factors in persons, including a plurality of patients having a specific disease, for each of the persons, is first performed. For example, genes and their expression levels in a plurality of patients or persons can be compiled into a database (see FIG. 1). In addition, it is also possible to match CNVs and their expression levels in a plurality of patients or persons (see the left figure of FIG. 4) or to match miRNAs and their expression levels (see the left figure of FIG. 7).

Then, in the present invention, the expression levels of the genetic factors and genes corresponding thereto are compared by any one or more of cluster analysis and correlation analysis, thereby selecting some of the genetic factors. This will be described in further detail.

Hereinafter, description will be made by way of the example of breast cancer among diseases, but it will be obvious to those of ordinary skill in the art that the present invention is not limited thereto and can be applied to all diseases.

FIG. 1 is an example of a matching table showing the expression levels of genes in each patient, which is used in a method for discovering biomarkers for subtyping according to one embodiment of the present invention; FIG. 2 is an example of the expression level of each gene of FIG. 1 in patients according to each disease type; and FIG. 3 is a table showing an example of genes clustered according to the expression pattern of FIG. 2.

The method for discovering biomarkers for subtyping according to the present invention comprises the steps of: matching the expression levels of genes on the chromosomes of a plurality of patients having a specific disease for each of the patients, and selecting only information about specific disease-related genes from among the above genes; analyzing the expression patterns of the genes in the patients according to the type of the disease; and clustering the genes according to the expression pattern.

This invention is directed to a method of using the patients' genes as genetic factors and analyzing the expression levels of the genes, thereby identifying biomarkers. This invention makes it possible to discover biomarkers by which even the subtypes of a specific disease can be identified.

In the method for discovering biomarkers for subtyping according to the present invention, as shown in FIG. 1, a step of matching the expression levels of genes on the chromosomes of a plurality of patients having a specific disease for each of the patients is first performed. That is, the expression levels of some or all genes in each patient are mapped. Herein, the patients may be classified according to the type of disease, and the order of the patients is not critical. Because each patient's genes also include genes that are not related to the specific disease, a step of selecting only information about specific disease-related genes from among the above genes may then be performed. For example, if each patient has about 30,000 genes, information on the breast cancer-related genes among them is extracted. Selecting only information about specific disease-related genes as described above may be performed using information about genes known to be related to the specific disease. Based on 327 items of information obtained from patients, papers, patents, study information and the like related to breast cancer, the present inventors selected 866 genes related to breast cancer. Herein, matching the expression levels of genes in each patient and selecting only information about specific disease-related genes from among the genes may be performed in any order or simultaneously.
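The gene-selection step can be illustrated with a minimal Python sketch. It assumes that the matching table of FIG. 1 is held as a dictionary mapping a patient ID to a {gene: expression level} dictionary and that the genes known to be related to the disease are supplied as a set; the names `MatchingTable` and `select_disease_genes` and the gene symbols are hypothetical, not taken from the inventors' implementation.

```python
from typing import Dict, Set

# Hypothetical layout of the FIG. 1 matching table:
# patient ID -> {gene symbol -> expression level}
MatchingTable = Dict[str, Dict[str, float]]


def select_disease_genes(table: MatchingTable, known_genes: Set[str]) -> MatchingTable:
    """Keep, for every patient, only the genes found in a curated set of
    genes known to be related to the disease (e.g. compiled from papers)."""
    return {
        patient: {gene: level for gene, level in levels.items() if gene in known_genes}
        for patient, levels in table.items()
    }


if __name__ == "__main__":
    table = {
        "P1": {"GENE_A": 9.2, "GENE_B": 12.0, "GENE_C": 7.1},
        "P2": {"GENE_A": 3.4, "GENE_B": 11.8, "GENE_C": 8.0},
    }
    print(select_disease_genes(table, {"GENE_A", "GENE_C"}))
```

Because the filter acts on each table entry independently, it gives the same result whether it is applied before, during or after the matching, which fits the statement that the two steps may be performed in any order or simultaneously.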

In the method for discovering biomarkers for subtyping according to the present invention, as shown in FIG. 2, a step of analyzing the expression patterns of the genes in the patients according to the disease type is then performed. That is, the expression patterns of specific genes in the patients of each disease type are analyzed, and in this analysis, the expression patterns can be divided into two or more levels. For example, as shown in FIG. 2, the expression pattern of each gene in each disease type can be divided into high and low levels. In the present invention, it is not the exact expression degree of each gene that is analyzed but its expression pattern, as described above, and the genes can then be clustered according to the expression pattern.
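A minimal sketch of the two-level pattern analysis follows. The document does not define how "high" and "low" are determined, so this sketch assumes that a gene is "high" in a disease type when its mean level in the patients of that type exceeds the gene's median over all patients; the cut-off rule and the function name `expression_pattern` are illustrative assumptions.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def expression_pattern(
    levels: Dict[str, float],    # patient ID -> expression level of ONE gene
    subtype_of: Dict[str, str],  # patient ID -> disease type (subtype)
    subtypes: List[str],         # fixed order of disease types
) -> Tuple[str, ...]:
    """Reduce one gene's per-patient levels to a pattern with one
    'high'/'low' label per disease type."""
    # Assumption: the cut-off is the gene's median over all patients.
    threshold = median(levels.values())
    pattern = []
    for subtype in subtypes:
        values = [v for patient, v in levels.items() if subtype_of[patient] == subtype]
        pattern.append("high" if mean(values) > threshold else "low")
    return tuple(pattern)
```

More than two levels could be obtained by replacing the single median cut-off with several quantiles; every disease type listed in `subtypes` must have at least one patient, otherwise `mean` raises an error.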

In other words, in the method for discovering biomarkers for subtyping according to the present invention, a step of clustering the genes according to the expression pattern, as shown in FIG. 3, is subsequently performed. Genes showing the same expression pattern across the disease types are grouped. Herein, clustering genes according to the expression pattern is performed by selecting and clustering only genes having similar expression patterns, and genes that cannot be clustered because their expression patterns differ are preferably excluded. In fact, the present inventors classified the 866 breast cancer-related genes into 4 categories according to the expression pattern, and the number of genes clustered in this manner was 646. As described above, the present invention is characterized in that the clustered genes are selected as markers related to subtyping of a specific disease, and when the selected genes are used as biomarkers and compared with the expression patterns of the genes of interest in a patient, the disease of the patient can be predicted.
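The clustering step can be sketched as grouping genes whose patterns are identical. This is a simplifying assumption: the text speaks of "similar" patterns, so exact matching and the minimum group size `min_size` are illustrative choices, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]  # e.g. ("high", "low", "low", "high"), one label per disease type


def cluster_by_pattern(patterns: Dict[str, Pattern], min_size: int = 2) -> Dict[Pattern, List[str]]:
    """Group genes sharing an identical expression pattern; groups smaller
    than `min_size` are treated as unclusterable and dropped."""
    groups: Dict[Pattern, List[str]] = defaultdict(list)
    for gene, pattern in patterns.items():
        groups[pattern].append(gene)
    return {p: sorted(g) for p, g in groups.items() if len(g) >= min_size}


def subtyping_markers(clusters: Dict[Pattern, List[str]]) -> List[str]:
    """Every gene that ended up in some cluster becomes a subtyping marker."""
    return sorted(gene for genes in clusters.values() for gene in genes)
```

With a two-level pattern over two disease types there are at most four possible patterns, so in this toy setting the number of clusters is bounded by the number of distinct label combinations.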

FIG. 4 is an example of a matching table showing the expression levels of single nucleotide polymorphisms (SNPs) in each patient, which is used in a method of discovering biomarkers by copy-number variations (CNVs) according to a preferred embodiment of the present invention; FIG. 5 is an example of a chromosome on which a CNV region selected from the expression levels of the SNPs of FIG. 4 and a CNV region including effective genes are shown; and FIG. 6 is a graph showing an example of correlation analysis of the expression levels of a CNV of FIG. 4 and a gene corresponding thereto.

A method of identifying biomarkers by copy-number variations (CNVs) according to the present invention comprises the steps of: matching the expression levels of each of single nucleotide polymorphisms (SNPs) and genes on the chromosomes of a plurality of patients having a specific disease for each of the patients; selecting a CNV region in which the SNP expression level is higher or lower than a specific reference value, and selecting CNVs present on effective genes at the location on the chromosome of the CNV region; and performing correlation analysis of the expression levels of the selected CNVs and genes corresponding thereto on the chromosomes of the patients to select genes showing positive (+) correlation from among the above genes.

This invention is directed to a method of using SNPs and/or CNVs of patients as genetic factors and analyzing copy-number variations (CNVs) according to the expression levels of the genetic factors, thereby discovering biomarkers. This invention is based on the fact that specific disease-related SNPs exist and that the expression levels of specific genes including CNVs according to SNPs are directly proportional to the specific disease.

In the method of discovering biomarkers by copy-number variations (CNVs) according to the present invention, as shown in FIG. 4, a step of matching the expression levels of SNPs on the chromosomes of a plurality of patients having a specific disease for each of the patients is first performed. Herein, the CNVs selected from the SNPs may be all CNVs of the patients or may be only the CNVs related to a specific disease. Because such CNVs may include those that are not related to a specific disease, a process of selecting, from among them, CNVs that can be suitably used for analysis or assessment of the disease is required.

For this purpose, as shown in FIG. 5, the present invention comprises a step of selecting a CNV region in which the SNP expression level is higher or lower than a specific reference value, and selecting CNVs present on effective genes at the location on the chromosome of the CNV region. That is, because the CNVs according to the present invention are obtained from patients having a specific disease, disease-related CNVs are selected according to their expression levels, and in order to select, from among such CNVs, those having particular effects on gene expression, CNVs present on sequences containing effective genetic information are selected according to the locations of the CNVs. Herein, selecting the CNVs is preferably performed by selecting CNVs in which the SNP expression level is equal to or higher than a first reference value or equal to or lower than a second reference value, according to the correlation of the expression levels of the SNPs and genes corresponding thereto. For example, as shown in FIG. 5, the expression levels of SNPs present on chromosome 1 (ch. 1) can differ from each other, and among them, CNVs present on sequences containing effective genetic information can be selected according to the locations of the SNPs whose expression levels are higher or lower than the specific reference values.
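The region-selection step can be sketched as follows, under several labeled assumptions: each SNP is represented by its chromosome position and a single numeric signal (the document's "SNP expression level"), consecutive SNPs on the same side of a reference value are merged into one CNV region, a region must contain at least `min_snps` SNPs (a hypothetical parameter), and gene coordinates are known. The function names are hypothetical, and real CNV callers use more elaborate segmentation.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int, str]  # (start position, end position, "gain" or "loss")


def find_cnv_regions(
    snps: List[Tuple[int, float]],  # (chromosome position, SNP-derived signal)
    upper: float,                   # first reference value
    lower: float,                   # second reference value
    min_snps: int = 2,
) -> List[Region]:
    """Merge runs of adjacent SNPs whose signal is >= upper (gain) or
    <= lower (loss) into candidate CNV regions."""
    regions: List[Region] = []
    run: List[int] = []
    run_dir: Optional[str] = None
    for pos, signal in sorted(snps):
        direction = "gain" if signal >= upper else "loss" if signal <= lower else None
        if direction is not None and direction == run_dir:
            run.append(pos)  # extend the current run
            continue
        if run_dir is not None and len(run) >= min_snps:
            regions.append((run[0], run[-1], run_dir))  # close the finished run
        run = [pos] if direction is not None else []
        run_dir = direction
    if run_dir is not None and len(run) >= min_snps:
        regions.append((run[0], run[-1], run_dir))
    return regions


def genes_in_regions(regions: List[Region], genes: Dict[str, Tuple[int, int]]) -> List[str]:
    """Keep genes (sequences carrying genetic information) whose interval
    overlaps at least one CNV region."""
    return sorted(
        name
        for name, (g_start, g_end) in genes.items()
        if any(g_start <= r_end and g_end >= r_start for r_start, r_end, _ in regions)
    )
```

Keeping only regions that overlap gene intervals corresponds to selecting CNVs present on sequences containing effective genetic information.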

Then, a step of performing correlation analysis of the expression levels of the selected CNVs and genes corresponding thereto on the chromosomes of the patients (see the right figure of FIG. 4) to select genes showing positive (+) correlation is performed. For this purpose, the present invention further uses information about the expression levels of genes on the chromosomes of the patients; this information concerns the expression levels of the genes in the patients that correspond to the CNVs, and it may be the same as the information about the expression levels of chromosomal genes used in the above method for discovering biomarkers for subtyping (see FIG. 1). The correlation analysis is performed in order to extract, from among the above selected CNVs, those related to gene expression. That is, if the expression levels of the genes related to the CNVs (the genes in which the CNVs are located) increase as the expression levels of the CNVs obtained from the SNP expression increase, the CNVs and the genes corresponding thereto are considered to have a high correlation with the disease. On the contrary, if the expression levels of the CNVs and the genes corresponding thereto show negative (−) correlation or no particular correlation, the CNVs and the genes corresponding thereto are considered to have a low correlation with the disease.
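The positive-correlation filter can be sketched with a Pearson correlation computed across patients. The document does not name the correlation coefficient or a cut-off, so Pearson's r and the threshold `min_r = 0.5` are assumptions, as are the function names; each CNV is keyed here by the gene it lies in.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def positively_correlated_genes(
    cnv_levels: Dict[str, List[float]],   # gene -> per-patient level of the CNV it carries
    gene_levels: Dict[str, List[float]],  # gene -> per-patient expression, same patient order
    min_r: float = 0.5,                   # assumed cut-off for "positive (+) correlation"
) -> List[str]:
    """Select genes whose expression rises with the level of the CNV they carry."""
    return sorted(
        gene
        for gene, cnv in cnv_levels.items()
        if gene in gene_levels and pearson(cnv, gene_levels[gene]) >= min_r
    )
```

Genes with negative or near-zero correlation fall below `min_r` and are discarded, matching the text's rule that such CNV-gene pairs have a low correlation with the disease.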

In fact, the present inventors found 324 CNV regions from the expression levels of about one million SNPs, selected 327 genes according to the locations of the CNVs on the chromosomes, and then selected 73 genes showing positive (+) correlation from the 327 selected genes. As described above, the present invention is characterized in that CNVs related to a specific disease are selected and specific genes related thereto are selected as markers. When the selected genes are used as biomarkers and compared with the expression patterns of the genes of interest in a patient, the disease of the patient can be predicted.

FIG. 7 is an example of a matching table showing the expression levels of micro-RNAs (miRNAs) in each patient, which is used in a method of discovering biomarkers by miRNAs according to a preferred embodiment of the present invention; and FIG. 8 is a graph showing an example of correlation analysis of the expression levels of the miRNA of FIG. 7 and a gene corresponding thereto.

A method of discovering biomarkers by micro-RNAs (miRNAs) according to the present invention comprises the steps of: matching the expression levels of miRNAs and genes in a plurality of patients having a specific disease for each of the patients; and performing correlation analysis of the expression levels of the miRNAs and genes corresponding thereto, selecting genes showing negative (−) or positive (+) correlation, and selecting genes corresponding to specific disease-related miRNAs from among the selected genes.

This invention is a method of using patients' miRNAs as genetic factors and analyzing their expression levels to identify biomarkers. Specific disease-related miRNAs exist, and miRNAs act to inhibit the expression of genes. Thus, this invention is based on a negative (−) correlation in which the expression levels of the miRNAs are inversely proportional to the expression levels of specific genes. In addition, because some miRNAs act to increase the expression of genes, this invention is also based on a positive (+) correlation in which the expression levels of the miRNAs are proportional to the expression levels of specific genes related thereto.

In the method of discovering biomarkers by micro-RNAs (miRNAs) according to the present invention, as shown in FIG. 7, a step of matching the expression levels of each of miRNAs and genes in a plurality of persons, including patients, for each of the persons, is first performed. Herein, the miRNAs may be the total miRNAs of the persons or may be specific disease-related miRNAs. Because such miRNAs may also include those that are not related to a specific disease, a process of selecting, from among them, miRNAs that can be suitably used as biomarkers in the analysis or assessment of the disease is required.

For this purpose, the present invention includes a step of performing correlation analysis of the expression levels of the selected miRNAs and genes corresponding thereto (see the right figure of FIG. 7), selecting genes showing correlation (for example, negative (−) correlation as shown in FIG. 8), and selecting genes corresponding to specific disease-related miRNAs from among the selected genes. That is, because the miRNAs according to the present invention are obtained from all persons, including patients and normal persons, disease-related miRNAs must be selected from among them, and for this purpose, the specific disease-related miRNAs can be selected using miRNAs known to be related to the specific disease. At the same time, miRNAs having particular effects on gene expression must be selected from among such miRNAs, and for this purpose, correlation analysis is carried out in the present invention. For the correlation analysis, the present invention further uses information about the expression levels of genes on the chromosomes of the patients; this information concerns the expression levels of the genes in the patients that are to be correlated with the miRNAs, and it may be the same as the information about the expression levels of chromosomal genes used in the above method for discovering biomarkers for subtyping (see FIG. 1). The correlation analysis is performed in order to extract, from among the above selected miRNAs, those related to gene expression. That is, if, as the expression levels of the miRNAs increase, the expression levels of the genes related thereto (the genes corresponding to the miRNAs) move above or below a given reference value, the miRNAs and the genes corresponding thereto are considered to have a high correlation with the disease. On the contrary, if the expression levels of the miRNAs and the genes corresponding thereto show a correlation within the reference value or no particular correlation, the miRNAs and the genes corresponding thereto are considered to have a low correlation with the disease.

In this invention, the step of selecting genes corresponding to specific disease-related miRNAs from among the above genes may be performed in any order; for example, it may be performed before the correlation analysis. Specifically, the method of discovering biomarkers by micro-RNAs according to the present invention may comprise the steps of: matching the expression levels of each of micro-RNAs (miRNAs) and genes in persons, including a plurality of patients having a specific disease, for each of the persons; selecting genes corresponding to specific disease-related miRNAs from among the above genes; and performing correlation analysis of the expression levels of the specific disease-related miRNAs and genes corresponding thereto and selecting genes showing negative (−) or positive (+) correlation.
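The filter-first variant just described can be sketched as follows. Several inputs are assumptions: the miRNA-to-gene correspondence (`targets`) is taken as given, correlation is Pearson's r across persons, and a pair counts as correlated when |r| is at least a hypothetical cut-off `min_abs_r`. The function names and the miRNA and gene identifiers are hypothetical placeholders.

```python
from math import sqrt
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def mirna_markers(
    mirna_expr: Dict[str, List[float]],  # miRNA -> per-person expression
    gene_expr: Dict[str, List[float]],   # gene -> per-person expression, same person order
    targets: Dict[str, List[str]],       # miRNA -> genes assumed to correspond to it
    disease_mirnas: Set[str],            # miRNAs known to be disease-related
    min_abs_r: float = 0.5,              # assumed correlation cut-off
) -> Dict[Tuple[str, str], str]:
    """Return {(miRNA, gene): "negative" | "positive"} for strongly correlated pairs."""
    markers: Dict[Tuple[str, str], str] = {}
    # Step 1 (filter first): keep only disease-related miRNAs.
    for mirna in sorted(disease_mirnas & set(targets)):
        # Step 2: correlate each corresponding gene with the miRNA.
        for gene in targets[mirna]:
            r = pearson(mirna_expr[mirna], gene_expr[gene])
            if r <= -min_abs_r:
                markers[(mirna, gene)] = "negative"  # e.g. repression of the gene
            elif r >= min_abs_r:
                markers[(mirna, gene)] = "positive"
    return markers
```

Because the miRNA filter and the correlation cut-off are applied to each (miRNA, gene) pair independently, running the filter before or after the correlation yields the same pairs; filtering first only avoids computing correlations that would be discarded.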

In fact, based on 1,265 items of information obtained from patients, papers, patents, study information and the like related to breast cancer, the present inventors selected 38 miRNAs related to breast cancer, and then selected 246 genes from the genes related to the 38 selected miRNAs by negative (−) or positive (+) correlation analysis. As described above, the present invention is characterized in that specific disease-related miRNAs are selected and specific genes related thereto are selected as markers. When the selected genes are used as biomarkers and compared with the expression patterns of the genes of interest in a patient, the disease of the patient can be predicted.

FIG. 9 is an example of genes for each mechanism, illustrating the mechanism analysis used in a method of discovering biomarkers by mechanism analysis according to a preferred embodiment of the present invention; FIG. 10 is a table showing an example of the expression levels of genes belonging to mechanism I of FIG. 9; FIG. 11 is a table showing an example of the expression levels of genes belonging to mechanism II of FIG. 9; and FIG. 12 is a table showing an example of the expression levels of genes belonging to mechanism III of FIG. 9.

The method of discovering biomarkers by mechanism analysis according to the present invention comprises the steps of: classifying genes, belonging to a group of candidate genes suitable for use as biomarkers of a disease, as a group related to the action mechanism of a specific disease; and comparing the expression levels of the genes of the classified group in a plurality of patient groups and a normal person group, and selecting genes which are expressed more highly in the patient groups.

In this invention, candidate genes are grouped according to the relevance of their molecular biological action or function, and biomarkers are selected according to the expression of the genes in each group.

For this purpose, in the present invention, a step of classifying genes belonging to a candidate gene group as a group related to the action mechanism of a specific disease is performed first. As used herein, the term “action mechanism of a specific disease” refers to a relationship among genes defined by any one molecular biological action or function. For example, when genes A, B, E and F together perform a molecular biological function related to a specific disease, genes A, B, E and F can be classified as one mechanism (or pathway or network) I group, as shown in FIG. 9. This step may comprise a process of selecting a specific disease-related mechanism from a plurality of mechanisms, and this process may be performed by selecting a mechanism that includes genes showing high expression levels, using the gene expression level information used in the above gene expression (GE) analysis. That is, genes belonging to the candidate gene group can be classified as a group related to the action mechanism of a specific disease by comparing gene expression levels between a plurality of patient groups having the specific disease and a normal person group, and selecting a disease action mechanism that includes genes expressed more highly in the patient groups as the group related to the mechanism of action of the specific disease.
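The classification step can be sketched as follows, under stated assumptions: mechanisms are given as a hypothetical mapping from mechanism name to member genes, and "expressed more highly in the patient groups" is approximated by a simple comparison of group means. The expression values are invented for illustration and are not the data of FIG. 10 to FIG. 12.

```python
from statistics import mean


def disease_mechanisms(mechanisms, patient_expr, normal_expr):
    """
    mechanisms:   {mechanism name: [gene, ...]}  (hypothetical grouping)
    patient_expr: {gene: [expression level per patient]}
    normal_expr:  {gene: [expression level per normal person]}
    Returns {mechanism: [genes with higher mean expression in patients]},
    keeping only mechanisms that contain at least one such gene.
    """
    selected = {}
    for name, genes in mechanisms.items():
        up = [g for g in genes if mean(patient_expr[g]) > mean(normal_expr[g])]
        if up:
            selected[name] = up
    return selected


# Hypothetical data loosely following the naming of FIG. 9.
mechanisms = {"I": ["A", "B", "E", "F"], "III": ["D", "G"]}
patient_expr = {"A": [5, 6, 7], "B": [4, 4, 5], "E": [2, 2, 2],
                "F": [9, 8, 9], "D": [1, 1, 1], "G": [2, 3, 2]}
normal_expr = {"A": [1, 2, 1], "B": [1, 1, 1], "E": [3, 3, 3],
               "F": [2, 2, 3], "D": [2, 2, 2], "G": [3, 3, 3]}
result = disease_mechanisms(mechanisms, patient_expr, normal_expr)
```

Here mechanism III is dropped because none of its genes is higher in patients; a mean comparison is only a screening step, and the significance test described next decides which genes are retained.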

After, simultaneously with, or before the above step, the present invention performs a step of comparing the expression levels of the genes of the classified group between the plurality of patient groups having the specific disease and the normal person group, and selecting genes that are expressed more highly in the patient groups. This step may be performed by a T-test between the patient groups having the specific disease and the normal person group. Specifically, as shown in FIG. 10, when a T-test (significance level: 0.01) is performed on the genes belonging to mechanism I in the patient groups and the normal person group, genes A, B and F fall within the significance level, indicating a significant difference between the patient groups and the normal group and suggesting that genes A, B and F can be effective biomarkers. In contrast, the p-value of gene E is higher than 0.01, and thus gene E cannot be an effective biomarker. By the same principle, only genes L and Q in mechanism II of FIG. 11 can be effective biomarkers, and no gene in mechanism III of FIG. 12 can be an effective biomarker. Consequently, mechanism III cannot be classified as a group related to the mechanism of action of the specific disease.
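The T-test selection can be sketched in pure Python as below. This is an illustrative sketch, not the inventors' code: it assumes a two-sample Student's t-test with pooled variance (the text does not say which T-test variant was used), computes the two-sided p-value through the regularized incomplete beta function (continued-fraction evaluation), and keeps a gene only if its patient mean is higher and p < 0.01. The gene names and values in the usage data are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, variance

_TINY = 1e-300  # guards against division by zero in Lentz's algorithm


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_test(patients, normals):
    """Two-sample Student's t-test (pooled variance); returns (t, two-sided p)."""
    n1, n2 = len(patients), len(normals)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(patients) + (n2 - 1) * variance(normals)) / df
    t = (mean(patients) - mean(normals)) / sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    p = betai(df / 2.0, 0.5, df / (df + t * t))
    return t, p


def significant_genes(genes, patient_expr, normal_expr, alpha=0.01):
    """Genes of one mechanism that are higher in patients with p < alpha."""
    selected = []
    for gene in genes:
        t, p = student_t_test(patient_expr[gene], normal_expr[gene])
        if t > 0 and p < alpha:
            selected.append(gene)
    return selected


# Hypothetical expression values for two genes of one mechanism.
patient_expr = {"A": [10.0, 11.0, 12.0, 10.5, 11.5], "E": [5.0, 7.0, 6.0, 8.0, 4.0]}
normal_expr = {"A": [5.0, 6.0, 5.5, 6.5, 5.0], "E": [6.0, 5.0, 7.0, 6.0, 5.0]}
hits = significant_genes(["A", "E"], patient_expr, normal_expr)
```

An empty result for a mechanism corresponds to the mechanism III case above: no gene passes, so the mechanism is not classified as disease-related. In practice a statistics library would normally replace the hand-written p-value routine.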

As described above, by performing a T-test on the patient group and the normal person group, the step of classifying the genes as a group related to the mechanism of action of a specific disease and the step of selecting genes that are expressed more highly in the patient group can be performed at the same time.

Moreover, as another feature of the present invention, in the process of comparing the expression levels of the genes of the classified group and selecting genes that are expressed more highly in the patient group, the T-test is performed first on the genes of the classified group that have high expression levels, and the genes expressed more highly in the patient groups are selected accordingly. For example, as shown in FIG. 12, the T-test is first performed on gene E, which has the highest expression level among genes E, G, P and D; when the result is confirmed not to fall within the significance level (0.01), the T-test does not need to be performed for the other genes G, P and D, and the mechanism and the genes belonging to it are judged to be unnecessary.
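The early-stopping order described above can be sketched as follows. The sketch assumes that "expression level" means the mean expression in the patient groups, and it takes the T-test as a caller-supplied `p_value` function (a hypothetical hook, for example one wrapping a t-test routine) so that only the testing order is shown. The p-values in the usage data are invented.

```python
def screen_mechanism(mean_expr, p_value, alpha=0.01):
    """
    mean_expr: {gene: mean expression level in the patient groups}
    p_value:   callable gene -> T-test p-value (patients vs normal persons);
               hypothetical hook supplied by the caller
    Tests the most highly expressed gene first. If it is not significant,
    the remaining genes are not tested and the mechanism is discarded.
    """
    ordered = sorted(mean_expr, key=mean_expr.get, reverse=True)
    if not ordered:
        return []
    top, *rest = ordered
    if p_value(top) >= alpha:
        return []  # mechanism and its genes judged unnecessary
    return [top] + [g for g in rest if p_value(g) < alpha]


# Hypothetical p-values; `calls` records which genes were actually tested.
calls = []
pvals = {"E": 0.2, "G": 0.3, "P": 0.5, "D": 0.03, "A": 0.001, "B": 0.005}


def fake_p(gene):
    calls.append(gene)
    return pvals[gene]


mechanism_iii = screen_mechanism({"E": 9.0, "G": 4.0, "P": 3.0, "D": 2.0}, fake_p)
```

This saves tests when the top gene fails, at the cost of assuming that a weaker-expressed gene in the same mechanism would not pass where the strongest one did.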

In addition, in the method of discovering biomarkers by mechanism analysis according to the present invention, the candidate gene group preferably includes genes obtained by the above-described biomarker identification methods. In this case, more accurate biomarkers can be selected by using the method of discovering biomarkers by mechanism analysis together with the above-described biomarker identification methods.

Furthermore, the candidate gene group more preferably includes genes obtained by the method for identification of biomarkers for subtyping, genes obtained by the method of discovering biomarkers by copy-number variations (CNVs), and genes obtained by the method of discovering biomarkers by micro-RNAs (miRNAs). In this case, the most accurate biomarkers can be selected by combining various biomarker discovery methods applied to patients and persons.

In fact, as shown in FIG. 9, the present inventors obtained 646 genes by the method for discovering biomarkers for subtyping, 73 genes by the method of discovering biomarkers by copy-number variations, and 246 genes by the method of discovering biomarkers by micro-RNAs, yielding 965 non-overlapping candidate genes. The present inventors then analyzed breast cancer-related mechanisms among 1,340 mechanisms, thereby finally selecting 215 genes.
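Pooling the three candidate lists can be sketched as a set union that also records which discovery methods found each gene, mirroring the "Discovery type" column of Table 1. The sketch assumes gene symbols are compared case-insensitively (Table 1 mixes forms such as "Erbb2" and "ERBB2"); the example lists are hypothetical. The pooled count equals the sum of the list sizes (here 646 + 73 + 246 = 965) only when no gene is shared between lists.

```python
def pool_candidates(lists_by_method):
    """
    lists_by_method: {discovery type: iterable of gene symbols}
    Returns {GENE SYMBOL: [discovery types]}, each gene counted once.
    Symbols are upper-cased so that e.g. "Erbb2" and "ERBB2" merge.
    """
    pooled = {}
    for method, genes in lists_by_method.items():
        for gene in genes:
            methods = pooled.setdefault(gene.upper(), [])
            if method not in methods:
                methods.append(method)
    return pooled


# Hypothetical, much shorter lists than the real 646 / 73 / 246 genes.
pooled = pool_candidates({
    "GE": ["ERBB2", "ESR1", "E2F1"],
    "CNV": ["Erbb2", "MYC"],
    "miRNA": ["E2F1", "APC"],
})
```

With these lists, 7 entries collapse to 5 genes, and ERBB2 is tagged with both "GE" and "CNV".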

The 215 selected genes are shown in Table 1 below.

TABLE 1

| # | No. | Gene symbol | Gene function | Discovery type |
|---|---|---|---|---|
| 1 | 402 | Acacb | acetyl-Coenzyme A carboxylase beta | GE |
| 2 | 302 | ACADSB | acyl-Coenzyme A dehydrogenase, short/branched chain | GE |
| 3 | 272 | agl | amylo-1,6-glucosidase, 4-alpha-glucanotransferase | GE |
| 4 | 461 | Ap1g1 | adaptor-related protein complex 1, gamma 1 subunit | GE |
| 5 | 35 | APC | adenomatous polyposis coli | miRNA |
| 6 | 16 | APP | amyloid beta (A4) precursor protein | miRNA |
| 7 | 313 | aqp1 | aquaporin 1 (Colton blood group) | GE |
| 8 | 273 | AQP3 | aquaporin 3 (Gill blood group) | GE |
| 9 | 365 | Ar | androgen receptor | GE |
| 10 | 146 | Arf6 | ADP-ribosylation factor 6 | CNV |
| 11 | 289 | Atp7b | ATPase, Cu++ transporting, beta polypeptide | GE |
| 12 | 281 | AURKA | aurora kinase A; aurora kinase A pseudogene 1 | GE |
| 13 | 338 | AURKB | aurora kinase B | GE |
| 14 | 145 | Bad | BCL2-associated agonist of cell death | CNV |
| 15 | 39 | BCL2 | B-cell CLL/lymphoma 2 | miRNA |
| 16 | 12 | BDNF | brain-derived neurotrophic factor | miRNA |
| 17 | 224 | bhlhe40 | basic helix-loop-helix family, member e40 | GE |
| 18 | 238 | BIRC5 | baculoviral IAP repeat-containing 5 | GE |
| 19 | 345 | BUB1 | budding uninhibited by benzimidazoles 1 homolog (yeast) | GE |
| 20 | 274 | BUB1B | budding uninhibited by benzimidazoles 1 homolog beta (yeast) | GE |
| 21 | 423 | C3 | similar to Complement C3 precursor; complement component 3; hypothetical protein LOC100133511 | GE |
| 22 | 400 | capn3 | calpain 3, (p94) | GE |
| 23 | 262 | cav1 | caveolin 1, caveolae protein, 22 kDa | GE |
| 24 | 268 | CCNA2 | cyclin A2 | GE |
| 25 | 405 | CCNB1 | cyclin B1 | GE |
| 26 | 254 | CCNB2 | cyclin B2 | GE |
| 27 | 319 | CCND1 | cyclin D1 | GE |
| 28 | 126 | CCNE1 | cyclin E1 | miRNA |
| 29 | 299 | Ccne2 | cyclin E2 | GE |
| 30 | 351 | ccno | cyclin O | GE |
| 31 | 211 | cct5 | chaperonin containing TCP1, subunit 5 (epsilon) | GE |
| 32 | 310 | CD36 | CD36 molecule (thrombospondin receptor) | GE |
| 33 | 66 | CDC14B | CDC14 cell division cycle 14 homolog B (S. cerevisiae) | miRNA |
| 34 | 258 | cdc20 | cell division cycle 20 homolog (S. cerevisiae) | GE |
| 35 | 209 | CDC25A | cell division cycle 25 homolog A (S. pombe) | GE |
| 36 | 53 | Cdc42 | cell division cycle 42 (GTP binding protein, 25 kDa); cell division cycle 42 pseudogene 2 | miRNA |
| 37 | 399 | CDC42BPA | CDC42 binding protein kinase alpha (DMPK-like) | GE |
| 38 | 54 | CDC42P2 | cell division cycle 42 (GTP binding protein, 25 kDa); cell division cycle 42 pseudogene 2 | miRNA |
| 39 | 277 | cdc6 | cell division cycle 6 homolog (S. cerevisiae) | GE |
| 40 | 453 | cdca7 | cell division cycle associated 7 | GE |
| 41 | 440 | CDCA8 | cell division cycle associated 8 | GE |
| 42 | 222 | CDH1 | cadherin 1, type 1, E-cadherin (epithelial) | GE |
| 43 | 263 | Cdk1 | cell division cycle 2, G1 to S and G2 to M | GE |
| 44 | 153 | CDK11A | similar to cell division cycle 2-like 1 (PITSLRE proteins); cell division cycle 2-like 1 (PITSLRE proteins); cell division cycle 2-like 2 (PITSLRE proteins) | CNV |
| 45 | 154 | Cdk11b | similar to cell division cycle 2-like 1 (PITSLRE proteins); cell division cycle 2-like 1 (PITSLRE proteins); cell division cycle 2-like 2 (PITSLRE proteins) | CNV |
| 46 | 74 | CEBPB | CCAAT/enhancer binding protein (C/EBP), beta | miRNA |
| 47 | 386 | cebpd | CCAAT/enhancer binding protein (C/EBP), delta | GE |
| 48 | 297 | CENPA | centromere protein A | GE |
| 49 | 300 | CENPE | centromere protein E, 312 kDa | GE |
| 50 | 315 | CENPF | centromere protein F, 350/400ka (mitosin) | GE |
| 51 | 431 | CENPN | centromere protein N | GE |
| 52 | 243 | CFB | complement factor B | GE |
| 53 | 439 | CLTC | clathrin, heavy chain (Hc) | GE |
| 54 | 212 | CP | ceruloplasmin (ferroxidase) | GE |
| 55 | 148 | CTDSP2 | similar to hCG2013701; CTD (carboxy-terminal domain, RNA polymerase II, polypeptide A) small phosphatase 2 | CNV |
| 56 | 5 | CTNNB1 | catenin (cadherin-associated protein), beta 1, 88 kDa | miRNA |
| 57 | 306 | Cx3cr1 | chemokine (C-X3-C motif) receptor 1 | GE |
| 58 | 286 | CXCL1 | chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) | GE |
| 59 | 425 | cybrd1 | cytochrome b reductase 1 | GE |
| 60 | 311 | CYP2B6 | cytochrome P450, family 2, subfamily B, polypeptide 6 | GE |
| 61 | 93 | dcaf7 | WD repeat domain 68 | miRNA |
| 62 | 266 | DCK | deoxycytidine kinase | GE |
| 63 | 418 | DST | dystonin | GE |
| 64 | 179 | E2F1 | E2F transcription factor 1 | miRNA, GE |
| 65 | 441 | E2f5 | E2F transcription factor 5, p130-binding | GE |
| 66 | 234 | egfr | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | GE |
| 67 | 201 | Erbb2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | CNV, GE |
| 68 | 301 | Esr1 | estrogen receptor 1 | GE |
| 69 | 208 | ETS1 | v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) | GE |
| 70 | 167 | F11r | F11 receptor | CNV |
| 71 | 48 | F2 | coagulation factor II (thrombin) | miRNA |
| 72 | 499 | FABP4 | fatty acid binding protein 4, adipocyte | GE |
| 73 | 250 | Fadd | Fas (TNFRSF6)-associated via death domain | GE |
| 74 | 292 | FEN1 | flap structure-specific endonuclease 1 | GE |
| 75 | 395 | Fermt2 | fermitin family homolog 2 (Drosophila) | GE |
| 76 | 314 | Fgfr1 | fibroblast growth factor receptor 1 | GE |
| 77 | 287 | Fgfr4 | fibroblast growth factor receptor 4 | GE |
| 78 | 432 | FGG | fibrinogen gamma chain | GE |
| 79 | 464 | FLT1 | fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor) | GE |
| 80 | 213 | fn1 | fibronectin 1 | GE |
| 81 | 305 | Gas2 | growth arrest-specific 2 | GE |
| 82 | 340 | GATA3 | GATA binding protein 3 | GE |
| 83 | 303 | gfra1 | GDNF family receptor alpha 1 | GE |
| 84 | 502 | GMPS | guanine monphosphate synthetase | GE |
| 85 | 50 | Gna13 | guanine nucleotide binding protein (G protein), alpha 13 | miRNA |
| 86 | 394 | Gnas | GNAS complex locus | GE |
| 87 | 10 | gpD1 | glycerol-3-phosphate dehydrogenase 1 (soluble) | miRNA |
| 88 | 356 | Grb7 | growth factor receptor-bound protein 7 | GE |
| 89 | 27 | GTF2H1 | general transcription factor IIH, polypeptide 1, 62 kDa | miRNA |
| 90 | 4 | HDAC4 | histone deacetylase 4 | miRNA |
| 91 | 433 | Hhat | hedgehog acyltransferase | GE |
| 92 | 426 | Hjurp | Holliday junction recognition protein | GE |
| 93 | 348 | HOXB13 | homeobox B13 | GE |
| 94 | 130 | HSD17B12 | hydroxysteroid (17-beta) dehydrogenase 12 | miRNA |
| 95 | 332 | id4 | inhibitor of DNA binding 4, dominant negative helix-loop-helix protein | GE |
| 96 | 228 | Ifitm1 | interferon induced transmembrane protein 1 (9-27) | GE |
| 97 | 244 | IGF2 | insulin-like growth factor 2 (somatomedin A); insulin; INS-IGF2 readthrough transcript | GE |
| 98 | 334 | IKBKB | inhibitor of kappa light polypeptide gene enhancer in B-cells, kinase beta | GE |
| 99 | 309 | IL18 | interleukin 18 (interferon-gamma-inducing factor) | GE |
| 100 | 295 | IL6ST | interleukin 6 signal transducer (gp130, oncostatin M receptor) | GE |
| 101 | 245 | INS | insulin-like growth factor 2 (somatomedin A); insulin; INS-IGF2 readthrough transcript | GE |
| 102 | 182 | IRS1 | insulin receptor substrate 1 | miRNA, GE |
| 103 | 60 | ITCH | itchy E3 ubiquitin protein ligase homolog (mouse) | miRNA |
| 104 | 298 | ITGA2 | integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor) | GE |
| 105 | 346 | ITGA7 | integrin, alpha 7 | GE |
| 106 | 21 | Jun | jun oncogene | miRNA |
| 107 | 220 | JUP | junction plakoglobin | GE |
| 108 | 285 | KIF11 | kinesin family member 11 | GE |
| 109 | 430 | KIF15 | kinesin family member 15 | GE |
| 110 | 427 | kif20a | kinesin family member 20A | GE |
| 111 | 291 | KIF23 | kinesin family member 23 | GE |
| 112 | 337 | KIF2C | kinesin family member 2C | GE |
| 113 | 434 | Klf4 | Kruppel-like factor 4 (gut) | GE |
| 114 | 221 | KPNA2 | karyopherin alpha 2 (RAG cohort 1, importin alpha 1); karyopherin alpha-2 subunit like | GE |
| 115 | 336 | Krt14 | keratin 14 | GE |
| 116 | 227 | KRT18 | keratin 18; keratin 18 pseudogene 26; keratin 18 pseudogene 19 | GE |
| 117 | 233 | KRT5 | keratin 5 | GE |
| 118 | 323 | krt8 | keratin 8 pseudogene 9; similar to keratin 8; keratin 8 | GE |
| 119 | 352 | LAMA5 | laminin, alpha 5 | GE |
| 120 | 375 | lbp | lipopolysaccharide binding protein | GE |
| 121 | 304 | LRP2 | low density lipoprotein-related protein 2 | GE |
| 122 | 519 | lzts1 | leucine zipper, putative tumor suppressor 1 | GE |
| 123 | 207 | Mad2l1 | MAD2 mitotic arrest deficient-like 1 (yeast) | GE |
| 124 | 283 | MAOA | monoamine oxidase A | GE |
| 125 | 516 | MAOB | monoamine oxidase B | GE |
| 126 | 384 | MAP1B | microtubule-associated protein 1B | GE |
| 127 | 163 | MAP3K1 | mitogen-activated protein kinase 1 | CNV |
| 128 | 275 | mapt | microtubule-associated protein tau | GE |
| 129 | 210 | mccc2 | methylcrotonoyl-Coenzyme A carboxylase 2 (beta) | GE |
| 130 | 124 | mcl1 | myeloid cell leukemia sequence 1 (BCL2-related) | miRNA |
| 131 | 436 | Mcm10 | minichromosome maintenance complex component 10 | GE |
| 132 | 240 | mcm2 | minichromosome maintenance complex component 2 | GE |
| 133 | 380 | MCM4 | minichromosome maintenance complex component 4 | GE |
| 134 | 422 | mdm2 | Mdm2 p53 binding protein homolog (mouse) | GE |
| 135 | 269 | med1 | mediator complex subunit 1 | GE |
| 136 | 390 | MED24 | mediator complex subunit 24 | GE |
| 137 | 34 | MET | met proto-oncogene (hepatocyte growth factor receptor) | miRNA |
| 138 | 363 | MGLL | monoglyceride lipase | GE |
| 139 | 428 | MLF1IP | MLF1 interacting protein | GE |
| 140 | 276 | Mmp9 | matrix metallopeptidase 9 (gelatinase B, 92 kDa gelatinase, 92 kDa type IV collagenase) | GE |
| 141 | 507 | mtss1 | metastasis suppressor 1 | GE |
| 142 | 9 | myb | v-myb myeloblastosis viral oncogene homolog (avian) | miRNA |
| 143 | 231 | MYBL2 | v-myb myeloblastosis viral oncogene homolog (avian)-like 2 | GE |
| 144 | 178 | MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | CNV |
| 145 | 265 | myo6 | myosin VI | GE |
| 146 | 282 | NDC80 | NDC80 homolog, kinetochore complex component (S. cerevisiae) | GE |
| 147 | 216 | ndrg1 | N-myc downstream regulated 1 | GE |
| 148 | 454 | NFIA | nuclear factor I/A | GE |
| 149 | 330 | NFIB | nuclear factor I/B | GE |
| 150 | 471 | nfix | nuclear factor I/X (CCAAT-binding transcription factor) | GE |
| 151 | 307 | Nmu | neuromedin U | GE |
| 152 | 2 | NT5E | 5′-nucleotidase, ecto (CD73) | miRNA |
| 153 | 392 | Oip5 | Opa interacting protein 5 | GE |
| 154 | 429 | ORC6L | origin recognition complex, subunit 6 like (yeast) | GE |
| 155 | 215 | Pak2 | p21 protein (Cdc42/Rac)-activated kinase 2 | GE |
| 156 | 326 | PEG3 | paternally expressed 3; PEG3 antisense RNA (non-protein coding); zinc finger, imprinted 2 | GE |
| 157 | 214 | PGK1 | phosphoglycerate kinase 1 | GE |
| 158 | 31 | Phkb | phosphorylase kinase, beta | miRNA |
| 159 | 424 | Pigt | phosphatidylinositol glycan anchor biosynthesis, class T | GE |
| 160 | 520 | PIGV | phosphatidylinositol glycan anchor biosynthesis, class V | GE |
| 161 | 150 | PIK3CA | phosphoinositide-3-kinase, catalytic, alpha polypeptide | CNV |
| 162 | 71 | Pik3r1 | phosphoinositide-3-kinase, regulatory subunit 1 (alpha) | miRNA |
| 163 | 241 | PLK1 | polo-like kinase 1 (Drosophila) | GE |
| 164 | 11 | Plxnd1 | plexin D1 | miRNA |
| 165 | 25 | pnp | nucleoside phosphorylase | miRNA |
| 166 | 29 | POLR2K | polymerase (RNA) II (DNA directed) polypeptide K, 7.0 kDa | miRNA |
| 167 | 46 | POM121 | POM121 membrane glycoprotein (rat) | miRNA |
| 168 | 317 | PPARG | peroxisome proliferator-activated receptor gamma | GE |
| 169 | 149 | PPP6C | protein phosphatase 6, catalytic subunit | CNV |
| 170 | 45 | PRIM1 | primase, DNA, polypeptide 1 (49 kDa) | miRNA |
| 171 | 255 | PRKACB | protein kinase, cAMP-dependent, catalytic, beta | GE |
| 172 | 58 | PRKCI | protein kinase C, iota | miRNA |
| 173 | 42 | pten | phosphatase and tensin homolog; phosphatase and tensin homolog pseudogene 1 | miRNA |
| 174 | 271 | PTTG1 | pituitary tumor-transforming 1; pituitary tumor-transforming 2 | GE |
| 175 | 105 | Rab23 | RAB23, member RAS oncogene family | miRNA |
| 176 | 446 | racgap1 | Rac GTPase activating protein 1 pseudogene; Rac GTPase activating protein 1 | GE |
| 177 | 67 | RB1 | retinoblastoma 1 | miRNA |
| 178 | 142 | Rbl1 | retinoblastoma-like 1 (p107) | CNV |
| 179 | 125 | rheb | Ras homolog enriched in brain | miRNA |
| 180 | 347 | rrm2 | ribonucleotide reductase M2 polypeptide | GE |
| 181 | 166 | rsf1 | remodeling and spacing factor 1 | CNV |
| 182 | 260 | S100A8 | S100 calcium binding protein A8 | GE |
| 183 | 235 | Sfrp1 | secreted frizzled-related protein 1 | GE |
| 184 | 15 | SFRS9 | splicing factor, arginine/serine-rich 9 | miRNA |
| 185 | 75 | slc30a1 | solute carrier family 30 (zinc transporter), member 1 | miRNA |
| 186 | 33 | SLC35A1 | solute carrier family 35 (CMP-sialic acid transporter), member A1 | miRNA |
| 187 | 451 | SLC40A1 | solute carrier family 40 (iron-regulated transporter), member 1 | GE |
| 188 | 280 | slc5a6 | solute carrier family 5 (sodium-dependent vitamin transporter), member 6 | GE |
| 189 | 226 | SLC7A5 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 5 | GE |
| 190 | 257 | SLC7A8 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 8 | GE |
| 191 | 407 | Smarce1 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily e, member 1 | GE |
| 192 | 230 | SMC4 | structural maintenance of chromosomes 4 | GE |
| 193 | 417 | SNRPN | small nuclear ribonucleoprotein polypeptide N; SNRPN upstream reading frame | GE |
| 194 | 219 | STAT1 | signal transducer and activator of transcription 1, 91 kDa | GE |
| 195 | 308 | STAT4 | signal transducer and activator of transcription 4 | GE |
| 196 | 38 | tbca | tubulin folding cofactor A | miRNA |
| 197 | 288 | Tff3 | trefoil factor 3 (intestinal) | GE |
| 198 | 312 | TFRC | transferrin receptor (p90, CD71) | GE |
| 199 | 349 | TGFB2 | transforming growth factor, beta 2 | GE |
| 200 | 55 | Tgfbr2 | transforming growth factor, beta receptor II (70/80 kDa) | miRNA |
| 201 | 90 | Th1l | TH1-like (Drosophila) | miRNA |
| 202 | 205 | tk1 | thymidine kinase 1, soluble | GE |
| 203 | 1 | TNFRSF10A | tumor necrosis factor receptor superfamily, member 10a | miRNA |
| 204 | 252 | TNFSF10 | tumor necrosis factor (ligand) superfamily, member 10 | GE |
| 205 | 232 | tp53 | tumor protein p53 | GE |
| 206 | 259 | TRAF4 | TNF receptor-associated factor 4 | GE |
| 207 | 18 | TRAM1 | translocation associated membrane protein 1 | miRNA |
| 208 | 8 | TXNRD1 | thioredoxin reductase 1; hypothetical LOC100130902 | miRNA |
| 209 | 206 | Tyms | thymidylate synthetase | GE |
| 210 | 261 | UBE2C | ubiquitin-conjugating enzyme E2C | GE |
| 211 | 47 | UGP2 | UDP-glucose pyrophosphorylase 2 | miRNA |
| 212 | 40 | Vcam1 | vascular cell adhesion molecule 1 | miRNA |
| 213 | 6 | VIM | vimentin | miRNA |
| 214 | 217 | YWHAZ | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide | GE |
| 215 | 279 | ZWINT | ZW10 interactor | GE |

In Table 1 above, “No.” denotes the original number assigned to each gene, and “Discovery type” denotes the method used to discover the relevant gene (GE: gene expression; CNV: copy-number variation; miRNA: micro-RNA).

Meanwhile, another embodiment of the present invention is directed to breast cancer-related biomarkers including the genes shown in Table 1 above.

Also, the present invention may be directed to biomarkers that include the genes shown in Table 1 above and allow identification of the subtypes of breast cancer.

In addition, the present invention may be directed to a breast cancer test kit comprising: a microarray comprising probes corresponding to the genes shown in Table 1 above; and an optical measurement device for measuring changes in the expression of the genes.

FIG. 13 is a graph showing an example of the accuracy at each significance level for biomarkers identified by a biomarker identification method according to a preferred embodiment of the present invention. The present inventors constructed 508 probes corresponding to the 215 finally selected genes and performed T-tests at significance levels varying from 0.01 to 0.05. As a result, an accuracy of 94.8% was reached at a significance level of 0.01.

FIG. 14 is an optical photograph showing the results of identifying the subtypes of breast cancer using biomarkers identified by a biomarker identification method according to a preferred embodiment of the present invention. As can be seen therein, the 508 probes showed different optical properties among 4 types of breast cancer, suggesting that these probes allow identification of the breast cancer type.

The biomarkers according to the present invention were compared with the biomarkers of other companies, and the results of the comparison are shown in Table 2 below and FIG. 15. As can be seen in FIG. 15, the biomarkers according to the present invention partially overlap with the biomarkers of other companies, but the number of biomarkers that differ reaches 143.

TABLE 2

| Company name | Number of genes | Number of probes | Remarks |
|---|---|---|---|
| LG Electronics Co., Ltd. | 215 | 508 | GE: 346¹⁾, CNV: 47, miRNA: 162 |
| Koo Foundation Sun Yat-Sen Cancer Center (KFSYSCC; Taiwan cancer center) | 625 | 783 | GE: 783²⁾ |
| Agendia (the Netherlands) | 80 | 219 | GE: 219²⁾ |

¹⁾ Partial overlap between probes.
²⁾ Only GE data were used for KFSYSCC and Agendia.

In addition, the accuracies of the biomarkers of the present invention and those of KFSYSCC (Taiwan) were comparatively analyzed for 4 types of breast cancer. The results of the analysis are shown in Table 3 (KFSYSCC: 783 probes, 625 genes) and Table 4 (LG Electronics: 508 probes, 215 genes).

TABLE 3

| Type | Sensitivity | Specificity |
|---|---|---|
| Basal | 0.98 | 0.97 |
| HER2 | 0.85 | 0.95 |
| Luminal B | 0.53 | 0.95 |
| Luminal A | 0.43 | 0.89 |

Total accuracy (%): 87.80

TABLE 4

| Type | Sensitivity | Specificity |
|---|---|---|
| Basal | 0.98 | 0.96 |
| HER2 | 0.80 | 0.95 |
| Luminal B | 0.52 | 0.94 |
| Luminal A | 0.89 | 0.85 |

Total accuracy (%): 89.80

As can be seen in Tables 3 and 4 above, in a comparative test using a total of 250 samples, the multiple biomarkers of the present invention, consisting of a relatively small number of genes, showed a higher subtyping accuracy than those of KFSYSCC (Taiwan Cancer Center): a total accuracy of 89.80% versus 87.80%, with the largest gain in Luminal A sensitivity (0.89 versus 0.43).
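The text does not define how sensitivity, specificity and total accuracy were computed. A common convention, assumed here, is one-vs-rest per subtype (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) with total accuracy as the fraction of samples assigned to the correct subtype. The sketch below follows that convention; the labels are hypothetical and are not the 250 samples used in the comparison.

```python
def subtype_metrics(true_labels, predicted_labels):
    """
    One-vs-rest sensitivity and specificity for each subtype, plus total
    accuracy (fraction of samples whose predicted subtype is correct).
    Returns ({subtype: (sensitivity, specificity)}, accuracy).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    n = len(pairs)
    metrics = {}
    for s in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == s and p == s)
        fn = sum(1 for t, p in pairs if t == s and p != s)
        fp = sum(1 for t, p in pairs if t != s and p == s)
        tn = n - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        metrics[s] = (sensitivity, specificity)
    accuracy = sum(1 for t, p in pairs if t == p) / n
    return metrics, accuracy


# Hypothetical subtype calls for six samples.
true = ["Basal", "Basal", "HER2", "LumA", "LumA", "LumB"]
pred = ["Basal", "HER2", "HER2", "LumA", "LumB", "LumB"]
metrics, accuracy = subtype_metrics(true, pred)
```

Because each subtype is scored against all others, a classifier can reach high specificity for a subtype while still missing many of its samples, which is the pattern visible in the Luminal rows of the tables.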

Also, the accuracies of the biomarkers of the present invention and those of Agendia were comparatively analyzed for 3 types of breast cancer. The results of the analysis are shown in Table 5 (Agendia: 219 probes, 80 genes) and Table 6 (LG Electronics: 508 probes, 215 genes).

TABLE 5

| Type | Sensitivity | Specificity |
|---|---|---|
| Basal | 0.98 | 0.95 |
| HER2 | 0.85 | 0.94 |
| Luminal | 0.59 | 0.95 |

Total accuracy (%): 88.50

TABLE 6

| Type | Sensitivity | Specificity |
|---|---|---|
| Basal | 0.98 | 0.96 |
| HER2 | 0.80 | 0.95 |
| Luminal | 0.91 | 0.95 |

Total accuracy (%): 94.13

As can be seen in Tables 5 and 6, in a comparative test using a total of 250 samples, the multiple biomarkers of the present invention showed relatively uniform accuracy across subtypes (total accuracy 94.13%), whereas the multiple biomarkers of Agendia showed markedly lower accuracy in luminal-type prediction (sensitivity 0.59 versus 0.91; total accuracy 88.50%).

As described above, according to the present invention, highly accurate biomarkers for a specific disease can be identified in a simple and easy manner by comparing the expression levels of genetic factors and the genes corresponding thereto by any one or more of cluster analysis and correlation analysis.

Although the preferred embodiments of the present invention have been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

What is claimed is:
1. A method for discovering biomarkers, comprising the steps of: matching the expression levels of genetic factors in persons, including a plurality of patients having a specific disease, for each of the persons; and comparing the expression levels of the genetic factors and genes corresponding thereto by any one or more of cluster analysis and correlation analysis to select some of the genetic factors.
2. The method of claim 1, wherein the genetic factor is one or more selected from the group consisting of chromosomal genes, single nucleotide polymorphisms (SNPs), copy-number variations (CNVs) and micro-RNAs (miRNAs).
3. The method of claim 1, wherein matching the expression levels of the genetic factors for each of the persons is performed by matching the expression levels of genes on the chromosome of the plurality of patients having the specific disease for each of the patients, and the analysis of any one or more comprises the steps of: selecting information about genes related to the specific disease from among the genes; analyzing the expression patterns of the selected genes in the patients according to the type of the disease; and clustering the genes according to the expression patterns.
4. The method of claim 3, wherein selecting only the information about genes related to the specific disease from among the genes is performed by selecting only information about genes known to be related to the specific disease.
5. The method of claim 3, wherein analyzing the expression patterns of the selected genes in the patients according to the type of the disease is performed by dividing the expression patterns of the genes in the patients according to the disease type into two or more levels.
6. The method of claim 3, wherein the step of clustering the genes according to the expression patterns comprises a step of selecting only genes that can be clustered according to the expression patterns, and selecting the selected genes as markers related to subtyping of the specific disease.
7. The method of claim 1, wherein matching the expression levels of the genetic factors for each of the persons is performed by matching the expression levels of single nucleotide polymorphisms (SNPs) and genes on the chromosomes of the plurality of patients having the specific disease for each of the patients, and the analysis of any one or more comprises the steps of: selecting a copy-number variation (CNV) region in which the expression levels of the SNPs are higher or lower than a specific reference value, and selecting CNVs present on effective genes at the location on the chromosome of the CNV region; and performing correlation analysis of the expression levels of the selected CNVs and genes corresponding thereto on the chromosomes of the patients to select genes showing positive (+) correlation.
8. The method of claim 7, wherein the effective genes are sequences containing genetic information.
9. The method of claim 7, wherein selecting the CNVs is performed by selecting a CNV region in which the expression levels of the SNPs are higher than a first reference value or lower than a second reference value, and selecting CNVs present on sequences containing genetic information at the location on the chromosome of the CNV region.
10. The method of claim 1, wherein matching the expression levels of the genetic factors for each of the persons is performed by matching the expression levels of micro-RNAs (miRNAs) and genes in the persons, including the plurality of patients having the specific disease, for each of the persons, and the analysis of any one or more comprises a step of performing correlation analysis of the miRNAs and genes corresponding thereto to select genes showing negative (−) or positive (+) correlation, and selecting genes corresponding to miRNAs related to the specific disease from among the selected genes showing negative (−) or positive (+) correlation.
11. The method of claim 10, wherein the miRNAs related to the specific disease are miRNAs known to be related to the specific disease.
12. A method for discovering biomarkers by mechanism analysis, the method comprising the steps of: classifying genes belonging to a candidate gene group suitable for use as biomarkers of disease as a group related to the mechanism of action of a specific disease; and comparing the expression levels of genes of the classified group in a plurality of patient groups having the specific disease and a normal person group to select genes which are expressed more highly in the patient groups.
13. The method of claim 12, wherein the candidate gene group includes genes obtained by the method of claim 1.
14. The method of claim 12, wherein the candidate gene group includes genes obtained by the method of claim 3, genes obtained by the method of claim 7, and genes obtained by the method of claim 10.
15. The method of claim 12, wherein classifying the genes belonging to the candidate gene group as the group related to the mechanism of action of the specific disease is performed by comparing the expression levels of genes between the plurality of patient groups having the specific disease and the normal person group to select a mechanism of action of a disease, including genes which are expressed more highly in the patient groups, as the group related to the mechanism of action of the specific disease.
16. The method of claim 12, wherein selecting the genes which are expressed more highly in the patient groups having the specific disease is performed by selecting the genes, which are more highly expressed in the patient groups, by performing a T-test for the patient groups having the specific disease and the normal person group.
17. The method of claim 12, wherein comparing the expression levels of genes of the classified group to select genes which are expressed more highly in the patient groups is performed by first performing a T-test for genes of the classified group which have high expression levels, to select genes which are more highly expressed in the patient groups.
18. Breast cancer-related biomarkers including the genes shown in Table 1.
19. The biomarkers of claim 18, wherein the biomarkers allow identification of subtypes of breast cancer.
20. A breast cancer test kit comprising: a microarray including probes corresponding to the biomarkers of claim 18; and an optical measurement device for measuring changes in expressions of the genes.